child network
HKT: A Biologically Inspired Framework for Modular Hereditary Knowledge Transfer in Neural Networks
Tchenko, Yanick Chistian, Mohr, Felix, Abdelkader, Hicham Hadj, Tabia, Hedi
A prevailing trend in neural network research suggests that model performance improves with increasing depth and capacity - often at the cost of integrability and efficiency. In this paper, we propose a strategy to optimize small, deployable models by enhancing their capabilities through structured knowledge inheritance. We introduce Hereditary Knowledge Transfer (HKT), a biologically inspired framework for modular and selective transfer of task-relevant features from a larger, pretrained parent network to a smaller child model. Unlike standard knowledge distillation, which enforces uniform imitation of teacher outputs, HKT draws inspiration from biological inheritance mechanisms - such as memory RNA transfer in planarians - to guide a multi-stage process of feature transfer. Neural network blocks are treated as functional carriers, and knowledge is transmitted through three biologically motivated components: Extraction, Transfer, and Mixture (ETM). A novel Genetic Attention (GA) mechanism governs the integration of inherited and native representations, ensuring both alignment and selectivity. We evaluate HKT across diverse vision tasks, including optical flow (Sintel, KITTI), image classification (CIFAR-10), and semantic segmentation (LiTS), demonstrating that it significantly improves child model performance while preserving its compactness. The results show that HKT consistently outperforms conventional distillation approaches, offering a general-purpose, interpretable, and scalable solution for deploying high-performance neural networks in resource-constrained environments.
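The abstract does not give implementation details, so the following is only a minimal sketch of how a gated mixture of inherited (parent) and native (child) features might look. The class name GeneticAttentionMix, the 1x1 projection used to align channel widths, and the sigmoid gate are illustrative assumptions, not the HKT authors' implementation.

```python
# Hypothetical sketch: mix a parent feature map into a child feature map via a
# learned attention gate. Names and design choices are assumptions for illustration.
import torch
import torch.nn as nn

class GeneticAttentionMix(nn.Module):
    def __init__(self, parent_channels: int, child_channels: int):
        super().__init__()
        # Align the parent block's channel width to the child's ("Transfer").
        self.align = nn.Conv2d(parent_channels, child_channels, kernel_size=1)
        # Per-channel gate deciding how much inherited signal to keep ("Mixture").
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(child_channels * 2, child_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, parent_feat: torch.Tensor, child_feat: torch.Tensor) -> torch.Tensor:
        inherited = self.align(parent_feat)                       # [B, C_child, H, W]
        g = self.gate(torch.cat([inherited, child_feat], dim=1))  # [B, C_child, 1, 1]
        return g * inherited + (1.0 - g) * child_feat

# Toy usage: a parent block with 128 channels feeding a child block with 64.
mix = GeneticAttentionMix(parent_channels=128, child_channels=64)
out = mix(torch.randn(2, 128, 16, 16), torch.randn(2, 64, 16, 16))
print(out.shape)  # torch.Size([2, 64, 16, 16])
```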
Reviews: Transfer Learning with Neural AutoML
This paper applies both multi-task training and transfer learning to AutoML. It extends the ideas presented in the Neural Architecture Search (NAS) technique (Barret Zoph and Quoc V. Le). The authors maintain the two-layer solution, with one network, the "controller", choosing the architectural parameters for the "child" network that is used to solve the targeted task. The performance of the child network is fed back to the controller network to influence its subsequent choices. The novelty of this paper lies in the way this two-layer solution is used.
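The controller/child feedback loop described in the review can be illustrated with a small sketch. The tiny categorical controller, the toy search space, and the stubbed reward function below are placeholders for illustration only; they are not this paper's multi-task or transfer-learning setup.

```python
# Sketch of the generic NAS loop: the controller samples architecture choices,
# a (stubbed) child evaluation returns a reward, and the reward reinforces the
# controller's choices via a REINFORCE-style update.
import torch
import torch.nn as nn

SEARCH_SPACE = {"filters": [16, 32, 64], "kernel": [3, 5], "layers": [2, 4, 6]}

class Controller(nn.Module):
    def __init__(self, space):
        super().__init__()
        self.space = space
        # One learnable logit vector per architectural decision.
        self.logits = nn.ParameterDict(
            {k: nn.Parameter(torch.zeros(len(v))) for k, v in space.items()})

    def sample(self):
        choices, log_prob = {}, 0.0
        for name, options in self.space.items():
            dist = torch.distributions.Categorical(logits=self.logits[name])
            idx = dist.sample()
            log_prob = log_prob + dist.log_prob(idx)
            choices[name] = options[idx.item()]
        return choices, log_prob

def evaluate_child(arch) -> float:
    # Placeholder: a real system trains the child network on the target task
    # and returns its validation accuracy.
    return 0.5 + 0.1 * (arch["filters"] == 64) + 0.05 * (arch["kernel"] == 3)

controller = Controller(SEARCH_SPACE)
opt = torch.optim.Adam(controller.parameters(), lr=0.05)
baseline = 0.0
for step in range(50):
    arch, log_prob = controller.sample()
    reward = evaluate_child(arch)
    baseline = 0.9 * baseline + 0.1 * reward    # moving-average baseline
    loss = -(reward - baseline) * log_prob      # REINFORCE update
    opt.zero_grad()
    loss.backward()
    opt.step()
print("sampled architecture after training:", controller.sample()[0])
```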
Accelerate Intermittent Deep Inference
... have to execute intermittently. Communicating with remote servers requires significantly more energy than local computation or sensing, which has led to the development of on-device intelligence and the execution of deep neural network (DNN) inference on intermittent systems. Neural architecture search (NAS) techniques have been developed to automatically find highly accurate neural networks that can efficiently execute on deployed systems. With the increasing demand for deployment on battery-less edge devices, intermittent-aware neural architecture search is becoming crucial. DNN inference under intermittent power requires accumulative execution across power cycles, as ambient power is typically unstable and too weak for continuous execution.

More recently, contemporary trends focus on making Deep Neural Network (DNN) models runnable on battery-less intermittent devices. One approach is to shrink the DNN models by enabling weight sharing and pruning, and by conducting Neural Architecture Search (NAS) with an optimized search space to target specific edge devices [2] [8] [7] [9]. Another approach analyzes the intermittent execution and designs the corresponding system by performing NAS that is aware of intermittent execution cycles and resource constraints. However, the optimized NAS only considered consecutive execution with no power loss, and intermittent ...
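The notion of accumulative execution across power cycles can be illustrated with a small sketch: progress is committed after each layer so a power failure only loses the layer in flight. The pickle file standing in for non-volatile memory and the toy MLP are assumptions for illustration; real intermittent systems persist state to FRAM/NVM with hardware support.

```python
# Illustrative sketch of checkpointed, layer-granular DNN inference that can
# resume after a power loss. The file-based checkpoint and toy model are
# stand-ins, not a real intermittent-computing runtime.
import os
import pickle
import torch
import torch.nn as nn

CKPT = "inference_state.pkl"
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

def run_with_checkpoints(x: torch.Tensor) -> torch.Tensor:
    # Resume from the last completed layer if a checkpoint exists.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            start_layer, x = pickle.load(f)
    else:
        start_layer = 0
    for i in range(start_layer, len(model)):
        x = model[i](x)
        with open(CKPT, "wb") as f:   # commit progress before moving on
            pickle.dump((i + 1, x.detach()), f)
    os.remove(CKPT)                   # inference finished; clear state
    return x

out = run_with_checkpoints(torch.randn(1, 8))
print(out.shape)  # torch.Size([1, 4])
```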
Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts
Somayajula, Sai Ashish, Liang, Youwei, Singh, Abhishek, Zhang, Li, Xie, Pengtao
Pretrained Language Models (PLMs) have advanced Natural Language Processing (NLP) tasks significantly, but finetuning PLMs on low-resource datasets poses significant challenges such as instability and overfitting. Previous methods tackle these issues by finetuning a strategically chosen subnetwork on a downstream task, while keeping the remaining weights fixed to the pretrained weights. However, they rely on a suboptimal criterion for sub-network selection, leading to suboptimal solutions. To address these limitations, we propose a regularization method based on attention-guided weight mixup for finetuning PLMs. Our approach represents each network weight as a mixup of a task-specific weight and the pretrained weight, controlled by a learnable attention parameter, providing finer control over sub-network selection. Furthermore, we employ a bi-level optimization (BLO) based framework on two separate splits of the training dataset, improving generalization and combating overfitting. We validate the efficacy of our proposed method through extensive experiments, demonstrating its superiority over previous methods, particularly in the context of finetuning PLMs on low-resource datasets.
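A minimal sketch of the weight-mixup idea for a single linear layer is shown below: the effective weight is alpha * W_task + (1 - alpha) * W_pretrained, with alpha produced from a learnable attention parameter. The per-weight sigmoid gate and the toy layer are assumptions; the paper's bi-level optimization over two training splits is omitted.

```python
# Sketch of attention-guided weight mixup on one linear layer (assumed design,
# not the authors' exact implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixupLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Frozen pretrained weight kept as a buffer (not updated by the optimizer).
        self.register_buffer("w_pre", pretrained.weight.data.clone())
        # Task-specific weight and bias, initialised from the pretrained layer.
        self.w_task = nn.Parameter(pretrained.weight.data.clone())
        self.b_task = nn.Parameter(pretrained.bias.data.clone())
        # Learnable attention parameter: one gate per weight entry.
        self.alpha_logit = nn.Parameter(torch.zeros_like(pretrained.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)
        w = alpha * self.w_task + (1.0 - alpha) * self.w_pre
        return F.linear(x, w, self.b_task)

layer = MixupLinear(nn.Linear(768, 2))
logits = layer(torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```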
Stochastic Subnetwork Annealing: A Regularization Technique for Fine Tuning Pruned Subnetworks
Whitaker, Tim, Whitley, Darrell
Pruning methods have recently grown in popularity as an effective way to reduce the size and computational complexity of deep neural networks. Large numbers of parameters can be removed from trained models with little discernible loss in accuracy after a small number of continued training epochs. However, pruning too many parameters at once often causes an initial steep drop in accuracy which can undermine convergence quality. Iterative pruning approaches mitigate this by gradually removing a small number of parameters over multiple epochs. However, this can still lead to subnetworks that overfit local regions of the loss landscape. We introduce a novel and effective approach to tuning subnetworks through a regularization technique we call Stochastic Subnetwork Annealing. Instead of removing parameters in a discrete manner, we represent subnetworks with stochastic masks where each parameter has a probabilistic chance of being included or excluded on any given forward pass. We anneal these probabilities over time such that subnetwork structure slowly evolves as mask values become more deterministic, allowing for a smoother and more robust optimization of subnetworks at high levels of sparsity.
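The masking-and-annealing idea can be illustrated with a short sketch: each parameter has an inclusion probability, a fresh Bernoulli mask is sampled every forward pass, and the probabilities are annealed toward a deterministic 0/1 mask over training. The linear annealing schedule and the 20% target sparsity below are assumptions, not the authors' configuration.

```python
# Sketch of a stochastic subnetwork mask with annealed inclusion probabilities.
import torch

def stochastic_mask(keep_prob: torch.Tensor) -> torch.Tensor:
    # Sample a fresh Bernoulli mask on every forward pass.
    return torch.bernoulli(keep_prob)

def anneal(keep_prob, target, step, total_steps):
    # Linearly move probabilities toward the deterministic target mask (0 or 1).
    t = min(step / total_steps, 1.0)
    return (1.0 - t) * keep_prob + t * target

weight = torch.randn(64, 64)
target = (torch.rand_like(weight) > 0.8).float()   # keep ~20% of weights at the end
keep_prob = torch.full_like(weight, 0.5)           # start: every weight kept half the time

for step in range(1, 101):
    p = anneal(keep_prob, target, step, total_steps=100)
    mask = stochastic_mask(p)
    effective_weight = weight * mask               # what the forward pass would use

# After annealing, the mask is deterministic and matches `target`.
print(float(anneal(keep_prob, target, 100, 100).eq(target).float().mean()))  # 1.0
```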
Interpretable Diversity Analysis: Visualizing Feature Representations In Low-Cost Ensembles
Whitaker, Tim, Whitley, Darrell
Diversity is an important consideration in the construction of robust neural network ensembles. A collection of well trained models will generalize better if they are diverse in the patterns they respond to and the predictions they make. Diversity is especially important for low-cost ensemble methods because members often share network structure in order to avoid training several independent models from scratch. Diversity is traditionally analyzed by measuring differences between the outputs of models. However, this gives little insight into how knowledge representations differ between ensemble members. This paper introduces several interpretability methods that can be used to qualitatively analyze diversity. We demonstrate these techniques by comparing the diversity of feature representations between child networks using two low-cost ensemble algorithms, Snapshot Ensembles and Prune and Tune Ensembles. We use the same pre-trained parent network as a starting point for both methods which allows us to explore how feature representations evolve over time. This approach to diversity analysis can lead to valuable insights and new perspectives for how we measure and promote diversity in ensemble methods.
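One generic way to probe representational diversity between two child networks is sketched below: feed the same batch through both, capture an intermediate feature map from each with a forward hook, and compare them with cosine similarity. This is only an illustrative probe, not the specific visualization techniques introduced in the paper.

```python
# Sketch: compare intermediate feature representations of two child networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_child():
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())

child_a, child_b = make_child(), make_child()   # stand-ins for two ensemble members

def capture(model, x, layer_idx=2):
    feats = {}
    handle = model[layer_idx].register_forward_hook(
        lambda m, inp, out: feats.setdefault("f", out.detach()))
    model(x)
    handle.remove()
    return feats["f"].flatten(1)                 # [batch, features]

x = torch.randn(16, 3, 32, 32)
fa, fb = capture(child_a, x), capture(child_b, x)
similarity = F.cosine_similarity(fa, fb, dim=1).mean()
print(f"mean feature similarity: {similarity.item():.3f}")
```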
Learn Basic Skills and Reuse: Modularized Adaptive Neural Architecture Search (MANAS)
Chen, Hanxiong, Li, Yunqi, Zhu, He, Zhang, Yongfeng
Human intelligence is able to first learn some basic skills for solving basic problems and then assemble such basic skills into complex skills for solving complex or new problems. For example, the basic skills "dig hole," "put tree," "backfill" and "watering" compose a complex skill "plant a tree". Besides, some basic skills can be reused for solving other problems. For example, the basic skill "dig hole" can not only be used for planting a tree, but also for mining treasures, building a drain, or landfilling. The ability to learn basic skills and reuse them for various tasks is very important for humans because it helps to avoid learning too many skills for solving each individual task, and makes it possible to solve a compositional number of tasks by learning just a small number of basic skills, which saves a considerable amount of memory and computation in the human brain. We believe that machine intelligence should also capture the ability of learning basic skills and reusing them by composing them into complex skills. In computer science language, each basic skill is a "module", which is a reusable network with a concrete meaning that performs a specific basic operation. The modules are assembled into a bigger "model" for doing a more complex task. The assembling procedure is adaptive to the input or task, i.e., for a given task, the modules should be assembled into the best model for solving the task. As a result, different inputs or tasks could have different assembled models, which enables Auto-Assembling AI (AAAI). In this work, we propose Modularized Adaptive Neural Architecture Search (MANAS) to demonstrate the above idea. Experiments on different datasets show that the adaptive architecture assembled by MANAS outperforms static global architectures. Further experiments and empirical analysis provide insights into the effectiveness of MANAS.
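The module-assembly idea can be sketched with a toy model: a pool of small reusable modules ("basic skills") and a router that decides, per input, how strongly each module contributes at each composition step. The soft (softmax-weighted) routing and the two-step composition are illustrative simplifications, not the MANAS search procedure.

```python
# Toy sketch of input-adaptive assembly of reusable modules.
import torch
import torch.nn as nn

class ModularModel(nn.Module):
    def __init__(self, dim=32, num_modules=4, steps=2):
        super().__init__()
        # Reusable "basic skill" modules sharing one interface (dim -> dim).
        self.modules_pool = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_modules)])
        # Router scores every module for the current representation.
        self.router = nn.Linear(dim, num_modules)
        self.steps = steps
        self.head = nn.Linear(dim, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.steps):
            scores = self.router(x).softmax(dim=-1)              # [batch, num_modules]
            # Weighted mixture of module outputs keeps the assembly differentiable.
            outputs = torch.stack([m(x) for m in self.modules_pool], dim=1)
            x = (scores.unsqueeze(-1) * outputs).sum(dim=1)
        return self.head(x)

model = ModularModel()
logits = model(torch.randn(8, 32))
print(logits.shape)  # torch.Size([8, 10])
```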
Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
Xu, Runxin, Luo, Fuli, Zhang, Zhiyuan, Tan, Chuanqi, Chang, Baobao, Huang, Songfang, Huang, Fei
Recent pretrained language models extend from millions to billions of parameters. Thus the need to fine-tune an extremely large pretrained model with a limited training corpus arises in various downstream tasks. In this paper, we propose a straightforward yet effective fine-tuning technique, Child-Tuning, which updates a subset of parameters (called the child network) of large pretrained models by strategically masking out the gradients of the non-child network during the backward process. Experiments on various downstream tasks in the GLUE benchmark show that Child-Tuning consistently outperforms vanilla fine-tuning by 1.5~8.6 average score points across four different pretrained models, and surpasses prior fine-tuning techniques by 0.6~1.3 points. Furthermore, empirical results on domain transfer and task transfer show that Child-Tuning can obtain better generalization performance by large margins.
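A sketch in the spirit of Child-Tuning is shown below: after the backward pass, the gradients of every non-child parameter are zeroed so the optimizer only updates the child network. The fixed random mask and the 30% child ratio are illustrative choices, not the paper's task-driven selection of the child network.

```python
# Sketch: mask gradients of the non-child parameters so only a subset is updated.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One mask per parameter tensor: 1 = child (trainable), 0 = frozen.
child_ratio = 0.3
masks = {name: (torch.rand_like(p) < child_ratio).float()
         for name, p in model.named_parameters()}

x, y = torch.randn(16, 128), torch.randint(0, 2, (16,))
for _ in range(10):
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    # Zero the gradients of every non-child weight before the update.
    for name, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(masks[name])
    optimizer.step()
```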
Poisoning the Search Space in Neural Architecture Search
Wu, Robert, Saxena, Nayan, Jain, Rohan
Deep learning has proven to be a highly effective problem-solving tool for object detection and image segmentation across various domains such as healthcare and autonomous driving. At the heart of this performance lies neural architecture design which relies heavily on domain knowledge and prior experience on the researchers' behalf. More recently, this process of finding the most optimal architectures, given an initial search space of possible operations, was automated by Neural Architecture Search (NAS). In this paper, we evaluate the robustness of one such algorithm known as Efficient NAS (ENAS) against data agnostic poisoning attacks on the original search space with carefully designed ineffective operations. By evaluating algorithm performance on the CIFAR-10 dataset, we empirically demonstrate how our novel search space poisoning (SSP) approach and multiple-instance poisoning attacks exploit design flaws in the ENAS controller to result in inflated prediction error rates for child networks. Our results provide insights into the challenges to surmount in using NAS for more adversarially robust architecture search.
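The search space poisoning idea can be illustrated with a toy sketch: the original operation pool that a NAS controller samples from is extended with deliberately ineffective operations, and multiple copies are added so they are sampled more often. The specific ops (an output-zeroing op and a pure-noise op) and the pool layout are assumptions, not the exact attacks evaluated against ENAS.

```python
# Toy sketch of search space poisoning with ineffective operations.
import torch
import torch.nn as nn

class ZeroOp(nn.Module):
    def forward(self, x):          # discards all information
        return torch.zeros_like(x)

class NoiseOp(nn.Module):
    def forward(self, x):          # replaces features with random noise
        return torch.randn_like(x)

def original_search_space(channels: int) -> dict:
    return {
        "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
        "conv5x5": nn.Conv2d(channels, channels, 5, padding=2),
        "max_pool": nn.MaxPool2d(3, stride=1, padding=1),
        "identity": nn.Identity(),
    }

def poison(space: dict, num_copies: int = 3) -> dict:
    # Multiple-instance poisoning: add several copies of each ineffective op so
    # the controller is more likely to sample them.
    for i in range(num_copies):
        space[f"zero_{i}"] = ZeroOp()
        space[f"noise_{i}"] = NoiseOp()
    return space

space = poison(original_search_space(channels=16))
x = torch.randn(1, 16, 8, 8)
for name, op in space.items():
    print(name, op(x).shape)       # every op preserves the feature map shape
```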